
chore(release): add multi-platform release JAR build pipeline #87

Merged: andygrove merged 8 commits into apache:main from andygrove:multi-platform-release on May 25, 2026

@andygrove (Member)

Which issue does this PR close?

Rationale for this change

PR #77 added the runtime side of the fat-JAR design: NativeLibraryLoader reads org/apache/datafusion/<os>/<arch>/libdatafusion_jni.<ext> from the JAR and extracts it on demand. However, nothing in the repo actually produces a JAR with more than one platform's library inside: core/pom.xml's host-activated profile bundles only the host's lib, and the existing release scripts (dev/release/create-tarball.sh etc.) only produce source tarballs. A consumer pulling the artifact from Maven Central today gets a JAR that works on the platform it was built on and nowhere else.
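To make the lookup concrete, here is a hypothetical Java sketch of how a loader could map the running JVM onto that layout and extract the library. It is not the project's NativeLibraryLoader; the class name NativeResourcePath and its methods are illustrative. It assumes the <arch> directory is the JVM's os.arch value verbatim, which is consistent with the dry-run listing further down (OpenJDK reports amd64 on Linux x86-64 but x86_64 on Intel macOS).

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.Locale;

// Hypothetical sketch (not the project's NativeLibraryLoader): resolve the
// bundled native library for the running JVM and extract it to a temp file.
public final class NativeResourcePath {

    private NativeResourcePath() {}

    // Map os.name onto the <os> directory names used inside the JAR.
    static String osDir(String osName) {
        String os = osName.toLowerCase(Locale.ROOT);
        if (os.startsWith("linux")) {
            return "linux";
        }
        if (os.startsWith("mac") || os.startsWith("darwin")) {
            return "darwin";
        }
        throw new UnsupportedOperationException("Unsupported OS: " + osName);
    }

    // Assumption: <arch> is os.arch verbatim. OpenJDK reports "amd64" on
    // Linux x86-64, "x86_64" on Intel macOS and "aarch64" on ARM64 for both.
    static String resourcePath(String osName, String osArch) {
        String dir = osDir(osName);
        String ext = dir.equals("darwin") ? "dylib" : "so";
        return "org/apache/datafusion/" + dir + "/" + osArch + "/libdatafusion_jni." + ext;
    }

    // Copy the resource out of the JAR so it can be passed to System.load.
    static Path extract(String resource) throws IOException {
        try (InputStream in = NativeResourcePath.class.getClassLoader().getResourceAsStream(resource)) {
            if (in == null) {
                throw new IOException("Native library not found on classpath: " + resource);
            }
            String fileName = resource.substring(resource.lastIndexOf('/') + 1);
            Path target = Files.createTempDirectory("datafusion-jni").resolve(fileName);
            Files.copy(in, target, StandardCopyOption.REPLACE_EXISTING);
            return target;
        }
    }

    public static void main(String[] args) {
        String resource = resourcePath(System.getProperty("os.name"), System.getProperty("os.arch"));
        System.out.println(resource);
        // With a multi-platform JAR on the classpath, a loader would then call:
        // System.load(extract(resource).toString());
    }
}
```

Passing os.arch through unchanged keeps the sketch short; a real loader might normalize aliases instead, which would also require the build side to agree on the directory names.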

This PR adds the build side: a release-manager script that drives two Docker containers for the Linux arches and the host's own Rust toolchain for the macOS arches, assembles all four .so/.dylib files into a single JAR, and installs it into a temporary local Maven repo. A second script (not yet exercised in CI) signs and uploads that repo to Apache Nexus staging.

The structure mirrors datafusion-comet's release tooling (dev/release/comet-rm/Dockerfile, build-release-comet.sh, publish-to-maven.sh), simplified for this project: a single module pair (no Spark/Scala matrix), and macOS libs built natively on the release manager's macOS host (no OSXCross / Xcode SDK plumbing).

What changes are included in this PR?

New release tooling under dev/release/:

  • datafusion-java-rm/Dockerfile: Ubuntu 20.04 + Rust + protoc (arch-aware download). Single-stage, no OSXCross. Built twice via --platform=linux/{amd64,arm64}.
  • datafusion-java-rm/build-native-libs.sh: runs inside the container, clones the repo and runs cargo build --release. The container's platform determines the target arch.
  • build-release.sh: host orchestrator. Detects the host arch via uname -m, cleans and rebuilds both Docker images, runs each one, and docker-cp's the Linux libs into core/target/classes/org/apache/datafusion/linux/{amd64,aarch64}/. Then, on the host, it runs cargo build --release (native host arch) and cargo build --release --target <other>-apple-darwin (cross-compiling to the other arch), and copies both .dylib files into core/target/classes/org/apache/datafusion/darwin/{x86_64,aarch64}/. It finishes with ./mvnw -Ddatafusion.native.profile=release -DskipTests install into a temporary local Maven repo whose path is printed at the end. It also pre-cleans leftover named containers and traps SIGINT/SIGTERM/EXIT.
  • publish-to-maven.sh: Nexus staging upload. Creates a staging repo via the Nexus REST API, signs every artifact with GPG, uploads with curl --fail-with-body, and closes the staging repo. Not exercised in the dry run.
  • README.md: new "Binary Release: Multi-Platform JAR" section after the existing source-tarball flow.
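The macOS half of build-release.sh boils down to a small planning step. The Java sketch below is hypothetical (the real script is shell, and the class DarwinTargets and its Build record are illustrative names). It maps `uname -m` output (arm64 on Apple Silicon, x86_64 on Intel) to the Rust target that builds natively and the one that is cross-compiled, and uses Cargo's standard output layout: a plain `--release` build writes to native/target/release/, while a `--target <triple>` build writes to native/target/<triple>/release/. It assumes the crate lives in native/, as the native/target/... paths in this PR suggest, and needs Java 17+.

```java
import java.util.List;

// Hypothetical model of the macOS step in build-release.sh (the real script is shell).
public final class DarwinTargets {

    // One cargo invocation plus the copy that follows it.
    record Build(String cargoTarget, boolean cross, String sourceDylib, String destDir) {}

    // "x86_64-apple-darwin" -> "x86_64", matching the darwin/<arch>/ directory names.
    static String archDir(String triple) {
        return triple.substring(0, triple.indexOf('-'));
    }

    static Build build(String triple, boolean cross) {
        // Cargo puts --target builds under target/<triple>/release, host builds under target/release.
        String src = cross
                ? "native/target/" + triple + "/release/libdatafusion_jni.dylib"
                : "native/target/release/libdatafusion_jni.dylib";
        String dest = "core/target/classes/org/apache/datafusion/darwin/" + archDir(triple) + "/";
        return new Build(triple, cross, src, dest);
    }

    // Host arch builds natively; the other macOS arch is cross-compiled.
    static List<Build> plan(String unameM) {
        String host = switch (unameM) {
            case "arm64", "aarch64" -> "aarch64-apple-darwin";
            case "x86_64" -> "x86_64-apple-darwin";
            default -> throw new IllegalArgumentException("Unsupported host arch: " + unameM);
        };
        String other = host.startsWith("aarch64") ? "x86_64-apple-darwin" : "aarch64-apple-darwin";
        return List.of(build(host, false), build(other, true));
    }

    public static void main(String[] args) {
        for (Build b : plan(args.length > 0 ? args[0] : "arm64")) {
            String cmd = b.cross() ? "cargo build --release --target " + b.cargoTarget() : "cargo build --release";
            System.out.println(cmd + "  =>  cp " + b.sourceDylib() + " " + b.destDir());
        }
    }
}
```

Running it with `arm64` (the dry-run host) prints a native aarch64 build followed by an x86_64 cross build, which is the order the PR description gives.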

Two follow-on fixes uncovered by the dry run:

  • pom.xml: add .github/** and dev/release/rat_exclude_files.txt to the apache-rat-plugin excludes. The plugin runs at the verify phase, which mvn install triggers but make test does not. These files were already exempt in dev/release/rat_exclude_files.txt (used by the source-tarball flow's check-rat-report.py); this brings the pom-level RAT check into alignment.
  • dev/release/build-release.sh: pass -Ddatafusion.native.profile=release to ./mvnw install. core/pom.xml defaults this property to debug, so without the override the antrun copy-native-lib step looks for the host's lib under native/target/debug/ and fails its <fail> precondition check.
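The profile mismatch can be modeled in a few lines. This hypothetical Java sketch (class NativeProfileCheck is illustrative; the real check is an antrun `<fail>` in core/pom.xml) shows the property choosing the Cargo output directory and the build failing when the host's library is not there. The default of debug and the native/target/<profile>/ layout come from the PR; the exact file name checked by the pom is an assumption.

```java
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical model of the antrun copy-native-lib precondition in core/pom.xml.
public final class NativeProfileCheck {

    // core/pom.xml defaults datafusion.native.profile to debug (per the PR).
    static final String DEFAULT_PROFILE = "debug";

    // Cargo writes release builds to target/release and dev builds to target/debug.
    static Path expectedLib(Path repoRoot, String profile, String libFileName) {
        return repoRoot.resolve("native").resolve("target").resolve(profile).resolve(libFileName);
    }

    // Fails the way the <fail> check does when the library is not where the profile points.
    static Path requireLib(Path repoRoot, String profileOverride, String libFileName) {
        String profile = profileOverride != null ? profileOverride : DEFAULT_PROFILE;
        Path lib = expectedLib(repoRoot, profile, libFileName);
        if (!Files.isRegularFile(lib)) {
            throw new IllegalStateException(
                    "Native library not found: " + lib + " (datafusion.native.profile=" + profile + ")");
        }
        return lib;
    }

    public static void main(String[] args) {
        Path repoRoot = Path.of(args.length > 0 ? args[0] : ".");
        // Analogous to passing -Ddatafusion.native.profile=release to ./mvnw.
        String override = System.getProperty("datafusion.native.profile");
        // .dylib because the dry run was on macOS; Linux hosts would use .so.
        System.out.println(requireLib(repoRoot, override, "libdatafusion_jni.dylib"));
    }
}
```

After `cargo build --release` only native/target/release/ is populated, so the default lookup fails and the explicit override succeeds, which is exactly the fix described above.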

Are these changes tested?

End-to-end dry run on macOS aarch64 produced datafusion-java-0.1.0-SNAPSHOT.jar (175 MB) containing exactly the four expected resource entries:

```
org/apache/datafusion/linux/amd64/libdatafusion_jni.so       ELF 64-bit LSB x86-64
org/apache/datafusion/linux/aarch64/libdatafusion_jni.so     ELF 64-bit LSB ARM aarch64
org/apache/datafusion/darwin/x86_64/libdatafusion_jni.dylib  Mach-O 64-bit x86_64
org/apache/datafusion/darwin/aarch64/libdatafusion_jni.dylib Mach-O 64-bit arm64
```
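A release manager might want to repeat this check before publishing. The following hypothetical Java sketch (VerifyReleaseJar is an illustrative name, not part of the PR) opens the JAR with java.util.jar.JarFile, confirms the four entries exist, and reads each header to approximate what `file` reported: the ELF e_machine field and the Mach-O 64-bit cputype. The magic numbers are standard format constants; the sketch assumes little-endian binaries, which holds for all four targets here.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.jar.JarEntry;
import java.util.jar.JarFile;

// Hypothetical pre-publish check; not part of the PR's tooling.
public final class VerifyReleaseJar {

    // Expected entry -> expected binary type, taken from the dry-run listing.
    static final Map<String, String> EXPECTED = Map.of(
            "org/apache/datafusion/linux/amd64/libdatafusion_jni.so", "ELF x86-64",
            "org/apache/datafusion/linux/aarch64/libdatafusion_jni.so", "ELF aarch64",
            "org/apache/datafusion/darwin/x86_64/libdatafusion_jni.dylib", "Mach-O x86_64",
            "org/apache/datafusion/darwin/aarch64/libdatafusion_jni.dylib", "Mach-O arm64");

    // Classify a binary from its first bytes (little-endian files only).
    static String describe(byte[] h) {
        if (h.length >= 20 && h[0] == 0x7F && h[1] == 'E' && h[2] == 'L' && h[3] == 'F') {
            int machine = (h[18] & 0xFF) | ((h[19] & 0xFF) << 8); // ELF e_machine
            return switch (machine) {
                case 62 -> "ELF x86-64";   // EM_X86_64
                case 183 -> "ELF aarch64"; // EM_AARCH64
                default -> "ELF machine " + machine;
            };
        }
        // MH_MAGIC_64 (0xFEEDFACF) as stored on disk in a little-endian Mach-O file.
        if (h.length >= 8 && (h[0] & 0xFF) == 0xCF && (h[1] & 0xFF) == 0xFA
                && (h[2] & 0xFF) == 0xED && (h[3] & 0xFF) == 0xFE) {
            int cpu = ByteBuffer.wrap(h, 4, 4).order(ByteOrder.LITTLE_ENDIAN).getInt();
            return switch (cpu) {
                case 0x01000007 -> "Mach-O x86_64"; // CPU_TYPE_X86_64
                case 0x0100000C -> "Mach-O arm64";  // CPU_TYPE_ARM64
                default -> "Mach-O cputype " + cpu;
            };
        }
        return "unknown";
    }

    // Returns a list of problems; an empty list means the JAR looks complete.
    static List<String> verify(String jarPath) throws IOException {
        List<String> problems = new ArrayList<>();
        try (JarFile jar = new JarFile(jarPath)) {
            for (Map.Entry<String, String> e : EXPECTED.entrySet()) {
                JarEntry entry = jar.getJarEntry(e.getKey());
                if (entry == null) {
                    problems.add("missing: " + e.getKey());
                    continue;
                }
                byte[] header;
                try (InputStream in = jar.getInputStream(entry)) {
                    header = in.readNBytes(20);
                }
                String actual = describe(header);
                if (!actual.equals(e.getValue())) {
                    problems.add(e.getKey() + ": expected " + e.getValue() + ", found " + actual);
                }
            }
        }
        return problems;
    }

    public static void main(String[] args) throws IOException {
        List<String> problems = verify(args[0]);
        problems.forEach(System.err::println);
        System.exit(problems.isEmpty() ? 0 : 1);
    }
}
```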

`./mvnw test` passes on the branch (308 tests run, 0 failures, 13 skipped) after running the standard `cargo build` (debug) precondition described in CLAUDE.md.

`publish-to-maven.sh` is not exercised in this PR. Validating it requires real Apache Nexus credentials and a real GPG-signed release candidate, both of which are out of scope for the build-pipeline dry run.

Are there any user-facing changes?

No code or API changes. Release-manager-facing tooling only.

andygrove added 8 commits May 25, 2026 09:26
The apache-rat-plugin runs at the `verify` phase, which is reached
during `mvn install` (the binary release flow) but not during the
`make test` lifecycle. Without these exclusions, the PR / issue
templates and the source-tarball exclude list itself fail the
license-header check.

The source-tarball flow already considers these files exempt via
dev/release/rat_exclude_files.txt; this brings the pom's plugin
configuration into alignment.

`core/pom.xml` defaults `datafusion.native.profile=debug`, so the
antrun copy-native-lib step expects a debug-mode dylib at
`native/target/debug/...`. The release orchestrator builds with
`cargo build --release`, which writes to `native/target/release/...`.

Pass `-Ddatafusion.native.profile=release` so the antrun check
looks at the right path and the produced JAR bundles the release
native libraries.

Address review feedback on the multi-platform JAR pipeline:

- build-release.sh: pre-clean leftover builder containers before the
  first `docker run`. Without this, a prior SIGKILL'd run leaves named
  containers behind and the next invocation fails immediately on
  `docker run --name` with a name conflict.
- publish-to-maven.sh: pass `--fail-with-body` to the upload-loop and
  close-staging curl calls. curl exits 0 on HTTP 4xx/5xx by default,
  which would silently let an unauthorised or rejected upload run to
  completion and corrupt the staging repo.
- Dockerfile: fold gcc-10 / g++-10 / cpp-10 into the main
  `apt-get install --no-install-recommends -y` layer so the apt cache
  stays consistent and the layer count is reduced.
- README: document `xmllint` as a release-manager prerequisite (used
  by publish-to-maven.sh to parse Nexus staging responses).
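The curl fix reflects a general rule for release scripts: an HTTP error status must stop the run. Java's java.net.http.HttpClient has the same default as plain curl, since send() returns normally for 4xx/5xx responses and the caller has to inspect the status. The hypothetical sketch below does what `--fail-with-body` does for curl: it treats any status of 400 or above as a failure while keeping the response body for diagnostics. The class name, the target URI, the Authorization header value, and the use of PUT are assumptions for illustration, not taken from publish-to-maven.sh.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.file.Path;

// Hypothetical Java analogue of `curl --fail-with-body` for a staging upload.
public final class FailingUpload {

    // Like --fail-with-body: any status >= 400 is an error, and the body is kept for the message.
    static void requireSuccess(int status, String body, String what) {
        if (status >= 400) {
            throw new IllegalStateException(what + " failed with HTTP " + status + ": " + body);
        }
    }

    // Uploads one file; without requireSuccess a 401 or 403 here would pass silently.
    static void upload(HttpClient client, URI target, Path file, String authHeader)
            throws IOException, InterruptedException {
        HttpRequest request = HttpRequest.newBuilder(target)
                .header("Authorization", authHeader)
                .PUT(HttpRequest.BodyPublishers.ofFile(file))
                .build();
        HttpResponse<String> response = client.send(request, HttpResponse.BodyHandlers.ofString());
        requireSuccess(response.statusCode(), response.body(), "Upload of " + file.getFileName());
    }
}
```

Failing on the first bad response leaves the staging repository open and unclosed, which is easier to clean up than a closed repository with missing or rejected artifacts.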
@andygrove (Member, Author)

@pgwhalen @LantaoJin fyi

andygrove merged commit 76a5454 into apache:main on May 25, 2026 (1 check passed).
andygrove deleted the multi-platform-release branch on May 25, 2026 at 18:28.